narrator is a template-based NLG system that produces written narratives from a data set.
There are several approaches to creating text from data; the two most widely used are:
Many R packages accommodate different types of templates, but glue is part of the tidyverse and very easy to use. A simple template can look like this:
Temperatures on {date} will reach the max of {max_temp} at {max_hour}.
When adding the actual variable values we get the text:
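As a minimal sketch of how such a template is filled, the snippet below passes the template above to glue::glue(). The variable values are made up for illustration and are not taken from the package documentation.

```r
library(glue)

# Hypothetical values, chosen only to illustrate the substitution
date     <- "2024-05-01"
max_temp <- 24
max_hour <- "15:00"

# glue() replaces each {name} with the value of the matching variable
forecast <- glue("Temperatures on {date} will reach the max of {max_temp} at {max_hour}.")
print(forecast)
```

With these values the printed text is "Temperatures on 2024-05-01 will reach the max of 24 at 15:00."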
To use narratives in reports or applications, convert them to HTML with to_html(). You can also format the numbers by setting format_numbers = TRUE in any narrate function.
sales %>%
  narrate_descriptive(measure = "Sales",
                      dimensions = "Product",
                      format_numbers = TRUE,
                      coverage = 0.8) %>%
  to_html()

sales %>%
  narrate_descriptive(measure = "Sales",
                      dimensions = "Product",
                      format_numbers = TRUE,
                      coverage = 0.8,
                      coverage_limit = 3) %>%
  to_html()

narrator works with both aggregated and non-aggregated data. To keep the narratives correct, choose the right summarization option: by default narrator uses sum, but you can use average or count instead.
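To see why the summarization choice matters, here is a small base R sketch on a made-up, non-aggregated order table. It does not call narrator and does not show how narrator implements these options; it only contrasts the three calculations on the same rows.

```r
# Toy order-level data; all values are invented for illustration
orders <- data.frame(
  OrderID = c(101, 102, 103, 104),
  Product = c("Tools", "Tools", "Food", "Food"),
  Sales   = c(10, 30, 20, 40)
)

# Sum suits additive measures such as Sales
sum_sales <- tapply(orders$Sales, orders$Product, sum)

# Average answers a different question: typical sales per order
avg_sales <- tapply(orders$Sales, orders$Product, mean)

# Count is the sensible choice for identifier columns such as OrderID,
# where summing the values would be meaningless
n_orders <- tapply(orders$OrderID, orders$Product, length)

print(rbind(sum = sum_sales, average = avg_sales, count = n_orders))
```

The same rows give 60, 30, and 2 for Food depending on the option, so a narrative built on the wrong one would read correctly and still state the wrong number.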
In a template-based system it is very useful to be able to change individual templates and so make the output more flexible.
$`Total Order ID`
Order ID Volume across all Products is equal to 10000.
$`Product by Order ID`
Outlying Products by Order ID are Food & Beverage (3552, 35.5 %), Electronics (1975, 19.8 %).
The variables available for narrative generation can be accessed with the return_data = TRUE argument in all narrate functions.
To see all available templates at once, use the list_templates() function.
| fun | name | template |
|---|---|---|
| narrate_descriptive | template_total | Total {measure} across all {pluralize(dimension_one)} is {total}. |
| narrate_descriptive | template_average | Average {measure} across all {pluralize(dimension_one)} is {total}. |
| narrate_descriptive | template_outlier | Outlying {dimension} by {measure} is {outlier_insight}. |
| narrate_descriptive | template_outlier_multiple | Outlying {pluralize(dimension)} by {measure} are {outlier_insight}. |
| narrate_descriptive | template_outlier_l2 | In {level_l1}, significant {level_l2} by {measure} is {outlier_insight}. |
| narrate_descriptive | template_outlier_l2_multiple | In {level_l1}, significant {pluralize(level_l2)} by {measure} are {outlier_insight}. |
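The placeholders in these templates can contain function calls as well as plain variable names. The sketch below renders the template_total string from the table with glue::glue_data(). It is an illustration under assumptions, not narrator's internal code: the pluralize() defined here is a toy stand-in and the variable values are made up.

```r
library(glue)

# Toy stand-in for a pluralizing helper; the helper used by narrator
# may handle irregular plurals and behave differently
pluralize <- function(x) paste0(x, "s")

# Hypothetical values for the template variables
vars <- list(measure = "Sales", dimension_one = "Product", total = "1,200")

# Template text copied from the table above
template_total <- "Total {measure} across all {pluralize(dimension_one)} is {total}."

# glue_data() looks up {names} in `vars` first, then in the calling
# environment, which is where it finds pluralize()
text <- glue_data(vars, template_total)
print(text)
```

With these inputs the result is "Total Sales across all Products is 1,200."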
A great way to instantly generate insights about how certain metrics develop over time is to create so-called trend narratives with the narrate_trend() function. Let’s create a dataset with dates:
data <- sales %>%
  dplyr::mutate(Date = lubridate::floor_date(Date, unit = "month")) %>%
  dplyr::group_by(Region, Product, Date) %>%
  dplyr::summarise(Sales = sum(Sales, na.rm = TRUE))

data %>%
  dplyr::ungroup() %>%
  dplyr::slice(1:8) %>%
  reactable::reactable(bordered = TRUE, striped = TRUE)

A basic trend narrative analyzes the data year-over-year. narrator requires date or datetime stamps in the data to create these narratives.
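As a rough illustration of what a year-over-year comparison involves, the base R sketch below totals made-up monthly sales per calendar year and computes the change against the previous year. This is an assumption about the general idea only; the exact calculation narrate_trend() performs may differ.

```r
# Toy monthly sales; all values are invented for illustration
monthly <- data.frame(
  Date  = as.Date(c("2020-01-01", "2020-02-01", "2021-01-01", "2021-02-01")),
  Sales = c(100, 150, 120, 180)
)

# Derive the calendar year from the date stamp and total sales per year
monthly$Year <- as.integer(format(monthly$Date, "%Y"))
yearly <- aggregate(Sales ~ Year, data = monthly, FUN = sum)

# Year-over-year change: absolute, and as a percentage of the previous year
previous <- c(NA, head(yearly$Sales, -1))
yearly$Change    <- yearly$Sales - previous
yearly$ChangePct <- round(100 * yearly$Change / previous, 1)

print(yearly)
```

The first year has no predecessor, so its change columns are NA. Without a usable date column there is nothing to group by, which is why the date or datetime stamps are required.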
narrator can use the ChatGPT API to improve your narratives. To do so, either set use_chatgpt = TRUE in any function that creates a narrative, or use enhance_narrative() to improve existing narrative output. You can supply a list or a character vector; the function collapses all text into a sentence and sends a request to ChatGPT. Set your token in the .Renviron file as OPENAI_API_KEY, or pass it to a function as the openai_api_key argument.
This functionality requires you to set up the ChatGPT API key and make it accessible from R.
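A quick way to confirm that the key is visible to your R session is to read it back with base R. This is only a sketch: it checks that the OPENAI_API_KEY environment variable is non-empty, not that the key is valid.

```r
# Sys.getenv() returns an empty string when the variable is not set
api_key <- Sys.getenv("OPENAI_API_KEY")
has_key <- nzchar(api_key)

if (has_key) {
  message("OPENAI_API_KEY is set.")
} else {
  message("OPENAI_API_KEY is not set; add it to .Renviron and restart R.")
}
```

In .Renviron the entry is a single line of the form OPENAI_API_KEY=your-key-here, and R reads that file at startup.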
narrative <- sales %>%
  narrate_descriptive(
    measure = "Sales",
    dimensions = c("Region", "Product"),
    use_chatgpt = TRUE
  )

cat(narrative)

The Total Sales of our company amount to $38,790,478.4, contributing to our success across all regions. However, our Outlying Regions stand out, with impressive Sales figures of $18,079,736.4, constituting 46.6% of the total Sales, followed by EMEA, with Sales of $13,555,412.7, comprising 34.9%. In our Outlying Region, Food & Beverage and Electronics have emerged as noteworthy Products, contributing $7,392,821 (40.9%) and $3,789,132.7 (21%) respectively, towards the impressive Sales figures. Similarly, in EMEA, Food & Beverage and Electronics have emerged as significant Products, contributing $5,265,113.2 (38.8%) and $3,182,803.4 (23.5%) respectively. Lastly, Food & Beverage and Electronics have been the Outlying Products driving our Sales, with a total contribution of $15,543,469.7 (40.1%) and $8,608,962.8 (22.2%) respectively.
Translate your text with the translate_narrative() function, passing the target language, written in English, as the language argument:
Celkové tržby naší společnosti činí 38 790 478,4 dolarů a přispívají k našemu úspěchu ve všech oblastech. Významně se však vynořují výsledky pro Naše Okrajové oblasti, s impozantním prodejem ve výši 18 079 736,4 dolarů, což představuje 46,6 % z celkových prodejů. Následuje oblast EMEA s prodejem 13 555 412,7 dolarů, což znamená 34,9 %. V Našich Okrajových oblastech vynikly produkty Výživa a Nápoje a Elektronika, které přispěly impozantními prodejními výsledky ve výši 7 392 821 dolarů (40,9 %) a 3 789 132,7 dolarů (21 %) z celkových prodejů. Podobně ve v oblasti EMEA, Výživa a Nápoje a Elektronika vynikly jako významné produkty, které přispěly 5 265 113,2 dolarů (38,8 %) a 3 182 803,4 dolarů (23,5 %) k celkovým prodejům. Nakonec je Výživa a Nápoje a Elektronika produkty, které vedou prodeje v Našich Okrajových oblastech, a to s celkovým přínosem 15 543 469,7 dolarů (40,1 %) a 8 608 962,8 dolarů (22,2 %) z celkových prodejů.
If your output is too verbose, you can summarize it with the summarize_narrative() function:
Our company’s total sales are $38,790,478.4, with our Outlying Regions contributing the most at 46.6%. These regions mainly sell Food & Beverage and Electronics, which together make up 40.9% of sales. Similarly, EMEA sells these products the most, accounting for 38.8% of sales. Overall, Food & Beverage and Electronics are the products driving our sales.